A Tool For Collecting Domain Dependent Sortal Constraints From Corpora

Authors

  • François Andry
  • Jean Mark Gawron
  • John Dowding
  • Robert C. Moore
Abstract

In this paper, we describe the results of using this semi-automatic tool to port the Gemini NL system to the ATIS domain, a domain that Gemini had already been ported to, and for which it had achieved high performance and grammatical coverage using hand-written sortal constraints. Choosing a known domain, rather than a new one, allowed us to compare the performance of the derived sorts to the hand-written ones, holding the domain, grammar, and lexicon constant. It also allowed us to evaluate the semi-automatically obtained coverage using the evaluation tools provided for the ATIS corpus.

Similar resources

Automatic sortal Interpretation of German Nominalisations with -ung Towards using underspecified Representations in Corpora

In this paper we present work on using dependency structures in a process of automatic sortal interpretation of German nominalisations with -ung, such as Messung (‘measurement’) or Zählung (‘count’). Many such -ung nominalisations are ambiguous with respect to their sortal interpretation (cf. Ehrich and Rapp (2000) who lean heavily on McCawley (1968) and Lakoff (1972) for the notion of sortal a...

Applying Constraints derived from the Context in the process of Incremental Sortal Specification of German ung-Nominalizations

Many German nominalizations with the affix -ung are sortally ambiguous. Within a sentence, lexico-semantic and/or syntactic phenomena may support disambiguation. The sortal interpretation of a nominalization may vary depending on the underlying syntactic analysis of one and the same, syntactically ambiguous sentence. We model the process of sortal disambiguation as a constraint-based incrementa...

Automatic Processing of Large Corpora for the Resolution of Anaphora References

Manual acquisition of semantic constraints in broad domains is very expensive. This paper presents an automatic scheme for collecting statistics on cooccurrence patterns in a large corpus. To a large extent, these statistics reflect semantic constraints and thus are used to disambiguate anaphora references and syntactic ambiguities. The scheme was implemented by gathering statistics on the ou...

Dactylize: Automatically Collecting Piano Fingering Data from Performance

A prototype system, dubbed “Dactylize,” for collecting fingering data automatically from actual piano performances is described. The solution promises to be an economical and accurate tool for developing corpora related to piano fingering. Evaluation of an early prototype suggests accuracy over 99% at rates up to 12.5 notes per second.

TweetCaT: a tool for building Twitter corpora of smaller languages

This paper presents TweetCaT, an open-source Python tool for building Twitter corpora that was designed for smaller languages. Using the Twitter search API and a set of seed terms, the tool identifies users tweeting in the language of interest together with their friends and followers. By running the tool for 235 days we tested it on the task of collecting two monitor corpora, one for Croatian ...

Publication date: 1994